Invited Abstract: Ricardo Baeza-Yates

Author

Ricardo Baeza-Yates

Abstract

In the dynamic ocean of web data, where we have over 200 million websites, web search engines are the primary way to access content. As the data is on the order of petabytes, current search engines are very large centralized systems based on replicated clusters that easily index more than 100 billion web pages. On the other hand, there are more than two billion Internet users, and hundreds of millions of queries are issued each day. In the near future, centralized systems are likely to become less effective against such a data-query load, which suggests the need for fully distributed search engines. Such engines need to maintain high-quality answers, fast response times, high query throughput, high availability, and scalability, in spite of network latency and scattered data. In this talk we present the main challenges behind the design of a distributed web retrieval system and our research on all the components of a search engine: crawling, indexing, and query processing.

Similar resources

Diseñemos Todo de Nuevo: Reflexiones sobre la Computación y su Enseñanza (Invited paper)

What and how to teach are the fundamental questions in our activities as lecturers. This paper presents my view on these questions as they relate to computer science, and offers a critical and constructive analysis and its implications for education, including two partial answers to these questions. Revista Colombiana de Computación, Volume 1, Number 1, pp. 7-28. Ricardo Baeza-Yates.

The Web as a Semantic Source

In this extended abstract we describe several semantic sources that can be found on the Web, either explicit (e.g. Wikipedia) or implicit (e.g. derived from Web data). We also show how we are using them to improve search and to generate new semantic resources, as our final goal is to produce a virtuous feedback loop for semantic enhancement based on machine learning.

A Model for Web Mining Applications – Conceptual Model, Architecture, Implementation and Use Cases

Web mining is a computation-intensive task even after the mining tool itself has been developed. However, most mining software is developed ad hoc and is usually neither scalable nor reusable for other mining tasks. This paper presents a Web mining model and implementation, referred to as WIM (Web Information Mining), where rapid prototyping is possible. The underlying conceptual model of WIM provi...
